The complexity of Shortest Common Supersequence for inputs with no identical consecutive letters
Authors
Abstract
The Shortest Common Supersequence problem (SCS for short) consists in finding a shortest common supersequence of a finite set of words over a fixed alphabet Σ. It is well known that its decision version, denoted [SR8] in [3], is NP-complete. Many variants have been studied in the literature. In this paper we settle the complexity of two such variants of SCS in which inputs do not contain identical consecutive letters. We prove that the decision versions of these variants, denoted φSCS and MSCS, both remain NP-complete when |Σ| ≥ 3. Note that this was known for MSCS when |Σ| ≥ 4 [2], and we discuss how [1] states a similar result for |Σ| ≥ 3. The two problems we study are described precisely below.

In this paper, we use the following terminology. Given two words over an alphabet Σ, u = u_1 … u_p (u_i ∈ Σ) and v = v_1 … v_q (v_i ∈ Σ), an embedding of u into v is a strictly increasing injection f from {1, …, p} into {1, …, q} such that u_i = v_f(i) for every i. Such an embedding witnesses that v is a supersequence of u, and we also say that f maps letters of u onto letters of v. We use the terms pattern, block and factor interchangeably to designate a sequence of consecutive letters in a word. A supersequence of a set of words is a word which is a supersequence of each of those words.

To define the first problem, we use the notations Σ_2 = {0, 1} and Σ_3 = {0, 1, 2}, and we define the word morphism φ : Σ_2* → Σ_3* by φ(0) = 0202 and φ(1) = 1.

Shortest Common Supersequence for some inputs generated by φ, decision version (φSCS)
Input: A set L = {w_1, …, w_n} of words over the alphabet Σ_3 such that L ⊆ φ(Σ_2*) and each w_i contains exactly two ones, which moreover are non-consecutive, and an integer k.
Output: Does there exist a supersequence of L of size less than k?

The second variant has been named Modified SCS in [2].

Modified Shortest Common Supersequence (MSCS), decision version
Input: A set L = {w_1, …, w_n} of words over an alphabet Σ = {a_1, a_2, …, a_d} such that no word w_i contains two consecutive identical letters and no word w_i starts with letter a_1, and an integer k.
Output: Does there exist a supersequence of L of size less than k?

A careful look at those two problems shows that φSCS is a particular case of MSCS if |Σ| ≥ 3. The input words for φSCS are concatenations of the patterns 0202 and 1 with no consecutive ones, thus they do not contain consecutive identical letters. Moreover, none of those input words starts …

∗ LIP, Université de Lyon, ENS Lyon, CNRS UMR5668, INRIA, UCB Lyon 1, France
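The definitions above can be made concrete with a minimal Python sketch. It is an illustration rather than anything taken from the paper: the function names (`embedding`, `is_common_supersequence`, `phi`, `is_phi_scs_word`, `is_mscs_word`) are hypothetical, positions are 0-based instead of 1-based, and treating the letter 2 as a_1 when viewing a φSCS input as an MSCS input is an assumption made here for the example.

```python
from typing import Iterable, List, Optional

def embedding(u: str, v: str) -> Optional[List[int]]:
    """Return a strictly increasing embedding of u into v as a list of
    0-based positions in v, or None if v is not a supersequence of u."""
    positions: List[int] = []
    j = 0
    for letter in u:
        # Greedily take the leftmost unused occurrence of `letter` in v.
        while j < len(v) and v[j] != letter:
            j += 1
        if j == len(v):
            return None
        positions.append(j)
        j += 1
    return positions

def is_common_supersequence(s: str, words: Iterable[str]) -> bool:
    """True if s is a supersequence of every word in `words`."""
    return all(embedding(w, s) is not None for w in words)

# The morphism phi on letters: phi(0) = 0202 and phi(1) = 1.
PHI = {"0": "0202", "1": "1"}

def phi(word: str) -> str:
    """Apply phi letter by letter to a word over {0, 1}."""
    return "".join(PHI[c] for c in word)

def is_phi_scs_word(w: str) -> bool:
    """Check the phiSCS input condition on a single word: w lies in the
    image of phi and contains exactly two ones, which are not adjacent."""
    i = 0
    while i < len(w):
        # Decode w block by block; the only possible blocks are 1 and 0202.
        if w[i] == "1":
            i += 1
        elif w.startswith("0202", i):
            i += 4
        else:
            return False
    return w.count("1") == 2 and "11" not in w

def is_mscs_word(w: str, alphabet: str, a1: str) -> bool:
    """Check the MSCS input condition on a single word: letters come from
    `alphabet`, no two consecutive letters are equal, and w does not
    start with the distinguished letter a1."""
    in_alphabet = all(c in alphabet for c in w)
    no_repeat = all(a != b for a, b in zip(w, w[1:]))
    return in_alphabet and no_repeat and not w.startswith(a1)
```

For instance, `phi("0101")` gives `0202102021`, which passes both `is_phi_scs_word` and `is_mscs_word(..., "012", "2")`, while `embedding("021", "0202102021")` returns `[0, 1, 4]`. The greedy leftmost matching suffices for deciding the supersequence relation; it says nothing about finding a shortest common supersequence, which is the hard part of the problem.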
Similar resources
On Finding Minimal, Maximal, and Consistent Sequences over a Binary Alphabet
In this paper we investigate the complexity of finding various kinds of common super- and subsequences with respect to one or two given sets of strings. We show that Longest Minimal Common Supersequence, Shortest Maximal Common Subsequence, and Shortest Maximal Common Non-Supersequence are MAX SNP-hard over a binary alphabet. Moreover, we show that Shortest Common Supersequence, Longest Common Su...
Analogs & Duals of the MAST Problem
Two natural kinds of problems about "structured collections of symbols" can be generally referred to as the Largest Common Sub-object and the Smallest Common Superobject problems, which we consider here as the dual problems of interest. For the case of rooted binary trees where the symbols occur as leaf-labels and a subobject is defined by label-respecting hereditary topological containment, both...
Problems Related to Subsequences and Supersequences
We present an algorithm for building the automaton that searches for all non-overlapping occurrences of each subsequence from the set of subsequences. Further, we define Directed Acyclic Supersequence Graph and use it to solve the generalized Shortest Common Supersequence problem, the Longest Common Non-Supersequence problem, and the Longest Consistent Supersequence problem.
Consistent Supersequences and Transversal Graphs an Extended Abstract
Motivation: A consistent supersequence is a common supersequence of the set of positive strings and a common non-supersequence of the set of negative strings. Different problems related to consistent supersequences find applications in molecular biology, learning theory, data compression, and manufacturing systems design, and draw attention due to their attractive combinatorial structure and challenging comp...
Parameterized Complexity and Biopolymer Sequence Comparison
The article surveys parameterized algorithms and complexities for computational tasks on biopolymer sequences, including the problems of longest common subsequence, shortest common supersequence, pairwise sequence alignment, multiple sequencing alignment, structure-sequence alignment, and structure-structure alignment. Algorithm techniques built on the structural-unit level, as well as on the r...
Journal: CoRR
Volume: abs/1309.0422
Publication date: 2013